Advances in Computers, Volume 112 by Unknown
Author:Unknown
Language: eng
Format: epub
ARTICLE IN PRESS
Data Warehouse Testing
31
Table 8 Examples of Achilles Data Quality Rules
Rule_id Data Quality Rule
Status
Description
19
Year of birth should not be Warning
Checks whether or not year
prior to 1800
of birth is less than 1800
32
Percentage of patients with Notification Checks whether or not the
no visits should not exceed
percentage of patients that
a threshold value
have no visit records is greater
than 5
need of the health data users. Achilles defines its data quality rules as SQL
queries while PEDSnet uses R. Table 8 shows examples of two data quality
rules that are validated in Achilles.
Note that the tools described in Section 4.1.1 to test the quality of the
underlying data in a data warehousing system can also be used to execute
data quality tests for the ETL process. The difference is that in the context
of ETL testing, the tools have a different purpose, which is to test any time
data is added or modified through the ETL process.
5.1.2 Balancing Tests
Balancing tests ensure that the data obtained from the source databases is not
lost or incorrectly modified by the ETL process. In this testing activity, data
in the source and target data warehouse are analyzed and differences are
reported.
The balancing approach called Sampling [65] uses source-to-target map-
ping documents to extract data from both the source and target tables and
store them in two spreadsheets. Then it uses the Stare and Compare technique
to manually verify data and determine differences through viewing or eye-
balling the data. Since this task can involve the comparison of billions of
records, most of the time, a few number of the entire set of records are ver-
ified through this approach.
IBM QuerySurge [65] is a commercial tool that was built specifically to
automate the balancing tests through query wizards. The tool implements a
method for fast comparison of validation query results written by testers [66].
The query wizards implement an interface to make sure that minimal effort
and no programming skills are required for developing balancing tests and
obtaining results. The tool compares data based on column, table, and record
count properties. Testers select the tables and columns to be compared in the
wizard. The problem with this tool is that it only compares data that is not
ARTICLE IN PRESS
32
Hajar Homayouni et al.
modified during the ETL transformation, which is claimed to be 80% of
data. However, the goal of ETL testing should also be to validate data that
has been reformatted and modified through the ETL process.
Another method is Minus Queries [65] in which the difference between
the source and target is determined by subtracting the target data from the
source data to show existence of unbalanced data. The problem with this
method is the potential for false positives. For example, as many data ware-
houses keep historical data, there may be duplicate records in the target data
warehouse corresponding to the same entity and the result might report an
error based on the differences in number of records in the source and the
target data warehouse. However, these duplications are actually allowed
in the target data warehouse.
We propose to identify discrepancies that may arise between the source
and the target data due to an incorrect transformation process. Based on
these discrepancies we define a set of properties, namely, completeness, con-
sistency, and syntactic validity.
Download
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.
The Brazilian Economy since the Great Financial Crisis of 20072008 by Philip Arestis Carolina Troncoso Baltar & Daniela Magalhães Prates(310375)
International Integration of the Brazilian Economy by Elias C. Grivoyannis(111321)
The Art of Coaching by Elena Aguilar(53423)
Flexible Working by Dale Gemma;(23322)
How to Stop Living Paycheck to Paycheck by Avery Breyer(19779)
Thinking, Fast and Slow by Kahneman Daniel(12424)
The Acquirer's Multiple: How the Billionaire Contrarians of Deep Value Beat the Market by Tobias Carlisle(12379)
The Radium Girls by Kate Moore(12096)
The Art of Thinking Clearly by Rolf Dobelli(10605)
Hit Refresh by Satya Nadella(9197)
The Compound Effect by Darren Hardy(9059)
Tools of Titans by Timothy Ferriss(8497)
Atomic Habits: Tiny Changes, Remarkable Results by James Clear(8424)
Turbulence by E. J. Noyes(8120)
A Court of Wings and Ruin by Sarah J. Maas(7955)
Change Your Questions, Change Your Life by Marilee Adams(7850)
Nudge - Improving Decisions about Health, Wealth, and Happiness by Thaler Sunstein(7760)
How to Be a Bawse: A Guide to Conquering Life by Lilly Singh(7550)
Win Bigly by Scott Adams(7269)